ABSTRACT

In recent years, social networking platforms have developed
into extraordinary channels for spreading and consuming
information. Along with the rise of such infrastructure, there
is continuous progress on techniques for spreading information
effectively through influential users. In many applications,
one is restricted to selecting influencers from a set of
users who engaged with the topic being promoted, and due
to the structure of social networks, these users often rank
low in terms of their influence potential. An alternative
approach is an adaptive method that selects users in a manner
that targets their influential neighbors.
The advantage of such an approach is that it leverages the
friendship paradox in social networks: while users are often
not influential, they often know someone who is.

Despite the various complexities in such optimization problems,
we show that scalable adaptive seeding is achievable.
In particular, we develop algorithms for linear influence models
with provable approximation guarantees that can be gracefully
parallelized. To show the effectiveness of our methods
we collected data from various verticals that social network
users follow. For each vertical, we collected data on the users who
responded to a certain post as well as their neighbors, and
applied our methods on this data. Our experiments show
that adaptive seeding is scalable, and importantly, that it
obtains dramatic improvements over standard approaches
of information dissemination.

Categories and Subject Descriptors 

H.2.8 [Database Management]: Database Applications:
Data Mining; F.2.2 [Analysis of Algorithms and Problem
Complexity]: Nonnumerical Algorithms and Problems
Figure 1: CDF of the degree distribution of users who liked
a post by Kiva on Facebook and that of their friends. 


1. INTRODUCTION 

The massive adoption of social networking services in recent
years creates a unique platform for promoting ideas and
spreading information. Communication through online social
networks leaves traces of behavioral data which allow
observing, predicting and even engineering processes of
information diffusion. First posed by Domingos and Richardson
[7, 24] and elegantly formulated and further developed
by Kempe, Kleinberg, and Tardos [13], influence maximization
is the algorithmic challenge of selecting a fixed number
of individuals who can serve as early adopters of a new idea,
product, or technology in a manner that will trigger a large
cascade in the social network.

In many cases where influence maximization methods are
applied one cannot select any user in the network but is
limited to some subset of users. As an example, consider
an online retailer who wishes to promote a product through
word-of-mouth by rewarding influential customers who
purchased the product. The retailer is then limited to selecting
influential users from the set of users who purchased the
product. In general, we will call the core set the set of users
an influence maximization campaign can access. When the
goal is to select influential users from the core set, the laws
that govern social networks can lead to poor outcomes. Due
to the heavy-tailed degree distribution of social networks,
high degree nodes are rare, and since influence maximization
techniques often depend on the ability to select high
degree nodes, a naive application of influence maximization
techniques to the core set can become ineffective.

An adaptive approach. An alternative approach to spending
the entire budget on the core set is an adaptive two-stage
approach. In the first stage, one can spend a fraction of the
budget on the core users so that they invite their friends to
participate in the campaign, then in the second stage spend
the rest of the budget on the influential friends who hopefully
have arrived. The idea behind this approach is to leverage
a structural phenomenon in social networks known as the
friendship paradox [9]. Intuitively, the friendship paradox
says that individuals are not likely to have many friends,
but they likely have a friend that does (“your friends have
more friends than you”). In Figure 1 we give an example of
such an effect by plotting a CDF of the degree distribution
of a core set of users who responded to a post on Facebook
and the degree distribution of their friends. Remarkably,
there are also formal guarantees of such effects. Recent work
shows that for any network that has a power law degree
distribution and a small fraction of random edges, there is an
asymptotic gap between the average degree of small samples
of nodes and that of their neighbors, with constant
probability [17]. The implication is that when considering the
core users (e.g. those who visit the online store) as random
samples from a social network, any algorithm which can use
their neighbors as influencers will have a dramatic improvement
over the direct application of influence maximization.

Warmup. Suppose we are given a network, a random set of
core users X and a budget k, and the goal is to select a
subset of nodes in X of size t < k which has the most influential
set of neighbors of size k - t. For simplicity, assume for now
that the influence of a set is simply its average degree. If
we take the k/2 highest degree neighbors of X, then surely
there is a set S of size at most k/2 in X connected to this
set, and selecting S would be a two-approximation to this
problem. In comparison, the standard approach of influence
maximization is to select the k highest degree nodes in X.
Thus, standard influence maximization would have k of the
most influential nodes in X while the approximation algorithm
we propose has k/2 of the most influential nodes from
its set of neighbors. How much better or worse is it to use
this approach over the standard one? If the network has a
power-law degree distribution with a small fraction of random
edges, and influence is measured in terms of sum of degrees
of a set, then the results of [17] discussed above imply
that the two-stage approach which allows seeding neighbors
can do asymptotically (in the size of the network) better.
Thus, at least intuitively, it looks as if two-stage approaches
may be worth investigating.
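
The comparison in the warmup can be tried on a toy graph. The following is a minimal sketch under stated assumptions, not the method evaluated later in the paper: the function names, the toy edge list, and the choice to count neighbors outside the core set are all hypothetical, and influence is measured by the sum of degrees.

```python
def degrees(adj):
    """Degree of every node in an undirected graph given as adjacency lists."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def standard_seeding(adj, core, k):
    """Baseline: the k highest-degree nodes of the core set X."""
    deg = degrees(adj)
    return sorted(core, key=lambda x: (-deg[x], x))[:k]


def two_stage_seeding(adj, core, k):
    """Warmup heuristic: take the k//2 highest-degree neighbors of X, then one
    core node adjacent to each of them (so at most k//2 core nodes)."""
    deg = degrees(adj)
    core_set = set(core)
    # Assumption of this sketch: neighbors are counted outside the core set.
    neighbors = {u for x in core_set for u in adj[x]} - core_set
    top = sorted(neighbors, key=lambda u: (-deg[u], u))[: k // 2]
    seeds = {min(x for x in adj[u] if x in core_set) for u in top}
    return seeds, top


# Toy graph: core user "a" knows the hub "h"; core user "b" knows only "c".
edges = [("a", "h"), ("h", "l1"), ("h", "l2"), ("h", "l3"), ("b", "c")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

deg = degrees(adj)
baseline = standard_seeding(adj, ["a", "b"], k=2)
seeds, targets = two_stage_seeding(adj, ["a", "b"], k=2)
print(sum(deg[x] for x in baseline), sum(deg[u] for u in targets))
```

On this toy instance the baseline spends the whole budget on two degree-one core users (total degree 2), while the two-stage heuristic spends one unit on "a" and one on the hub "h" (degree 4).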

In this paper, our goal is to study the potential benefits 
of adaptive approaches for influence maximization. We are 
largely motivated by the following question. 

Can adaptive optimization lead to significant improvements
in influence maximization?

To study this question we use the adaptive seeding model
recently formalized in [26]. The main distinction from the
caricature model in the warmup problem above is that in
adaptive seeding the core set X can be arbitrary (it does not
have to be random), and every neighbor of X is assumed to
arrive with some independent probability. These probabilities
model our uncertainty about whether the neighbors would be
interested in promoting the product, as
they are not in the core set. The goal in adaptive seeding
is to select a subset of nodes in X such that, in expectation
over all possible arrivals of its neighbors, one can select a
maximally influential set of neighbors with the remaining
budget.¹ It is worth noting that taking X to be the entire set
of nodes in the network we get the Kempe-Kleinberg-Tardos
model [13], and thus adaptive seeding can be seen as a
generalization of this model.

Scalability. One of the challenges in adaptive seeding is
scalability. This is largely due to the stochastic nature of the
problem derived from uncertainty about arrival of neighbors.
The main result in [26] is a constant factor approximation
algorithm for well-studied influence models such as independent
cascade and linear threshold which is, by and large, a
theoretical triumph. These algorithms rely on various forms
of sampling, which lead to a significant blowup in the input
size. While such techniques provide strong theoretical
guarantees, for social network data sets which are often either
large or massive, such approaches are inapplicable. The
main technical challenge we address in this work is how to
design scalable adaptive optimization techniques for influence
maximization which do not require sampling.

Beyond random users. The motivation for the adaptive 
approach hinges on the friendship paradox, but what if the 
core set is not a random sample? The results in [17] hold 
when the core set of users is random but since users who 
follow a particular topic are not a random sample of the 
network, we must somehow evaluate adaptive seeding on 
representative data sets. The experimental challenge is to 
estimate the prominence of high degree neighbors in settings 
that are typical of viral marketing campaigns. Figure 1 is a 
foreshadowing of the experimental methods we used to show 
that an effect similar to the friendship paradox exists in such 
cases as well. 

Main results. Our main results in this paper show that
adaptive seeding is a scalable approach which can dramatically
improve upon standard approaches of influence maximization.
We present a general method that enables designing
adaptive seeding algorithms in a manner that avoids
sampling, and thus makes adaptive seeding scalable to large
size graphs. We use this approach as a basis for designing
two algorithms, both achieving an approximation ratio
of (1 - 1/e) for the adaptive problem. The first algorithm
is implemented through a linear program, which proves to
be extremely efficient over instances where there is a large
budget. The second approach is a combinatorial algorithm
with the same approximation guarantee which can be easily
parallelized, has good theoretical guarantees on its running
time and does well on instances with smaller budgets. The
guarantees of our algorithms hold for linear models of
influence, i.e. models for which the influence of a set can be
expressed as the sum of the influence of its members. While
this class does not include models such as the independent
cascade and the linear threshold model, it includes the
well-studied voter model [12, 8] and measures such as node
degree, click-through-rate or retweet measures of users which
serve as natural proxies of influence in many settings [29].
In comparison to submodular influence functions, the relative
simplicity of linear models allows making substantial
progress on this challenging problem.

¹ The model can be extended to the case where nodes take on different costs,
and results we present here largely generalize to such settings as well. Although
it seems quite plausible that the probability of attracting neighbors could
depend on the rewards they receive, the model deliberately assumes unit costs,
consistent with the celebrated Kempe-Kleinberg-Tardos model [13]. Of course,
if the likelihood of becoming an early adopter is inversely proportional to one’s
influence, then any influence maximization model loses substance.

We then use these algorithms to conduct a series of experiments
to show the potential of adaptive approaches for
influence maximization both on synthetic and real social
networks. The main component of the experiments involved
collecting publicly available data from Facebook on users
who expressed interest in (“liked”) a certain post from a topic
they follow and data on their friends. The premise here is
that such users mimic potential participants in a viral marketing
campaign. The results on these data sets suggest
that adaptive seeding can have dramatic improvements over
standard influence maximization methods.

Paper organization. We begin by formally describing the
model and the assumptions we make in the following section.
In Section 3 we describe the reduction of the adaptive
seeding problem to a non-adaptive relaxation. In Section 4
we describe our non-adaptive algorithms for adaptive seeding.
In Section 5 we describe our experiments, and conclude
with a brief discussion on related work.

2. MODEL 

Given a graph $G = (V, E)$, for a node $v \in V$ we denote by
$\mathcal{N}(v)$ the neighborhood of $v$. By extension, for any subset of
nodes $S \subseteq V$, $\mathcal{N}(S) = \bigcup_{v \in S} \mathcal{N}(v)$ denotes the
neighborhood of $S$. The notion of influence in the graph is captured
by a function $f : 2^{V} \to \mathbb{R}_+$ mapping a subset of nodes to
a non-negative influence value.

The adaptive seeding model. The input of the adaptive
seeding problem is a core set of nodes $X \subseteq V$ and for any
node $u \in \mathcal{N}(X)$ a probability $p_u$ that $u$ realizes if one of
its neighbors in $X$ is seeded. We will write $m = |X|$ and
$n = |\mathcal{N}(X)|$ for the parameters controlling the input size. The
seeding process is the following:

1. Seeding: the seeder selects a subset of nodes $S \subseteq X$ in
the core set.

2. Realization of the neighbors: every node $u \in \mathcal{N}(S)$
realizes independently with probability $p_u$. We denote
by $R \subseteq \mathcal{N}(S)$ the subset of nodes that is realized during
this stage.

3. Influence maximization: the seeder selects the set of
nodes $T \subseteq R$ that maximizes the influence function $f$.

There is a budget constraint $k$ on the total number of
nodes that can be selected: $S$ and $T$ must satisfy $|S| + |T| \le k$.
The seeder chooses the set $S$ before observing the realization
$R$ and thus wishes to select optimally in expectation
over all such possible realizations. Formally, the objective
can be stated as:

$$\max_{S \subseteq X} \sum_{R \subseteq \mathcal{N}(S)} p_R \max_{T \subseteq R,\ |T| \le k - |S|} f(T) \quad \text{s.t. } |S| \le k \tag{1}$$

where $p_R$ is the probability that the set $R$ realizes,
$p_R = \prod_{u \in R} p_u \prod_{u \in \mathcal{N}(S) \setminus R} (1 - p_u)$.
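
For very small instances, objective (1) can be evaluated exactly by enumerating realizations. The sketch below is only illustrative and assumes an additive influence function with non-negative weights (so that the best second-stage set is simply the highest-weight realized nodes); the names `adaptive_value` and `best_seed_set`, the dictionaries `adj`, `p`, `w`, and the toy numbers are hypothetical.

```python
from itertools import combinations


def adaptive_value(S, adj, p, w, k):
    """Expected second-stage value of seeding S, i.e. the inner sum of (1),
    for an additive influence function with non-negative weights w."""
    nbrs = sorted({u for x in S for u in adj[x]})
    budget = max(k - len(S), 0)
    total = 0.0
    for r in range(len(nbrs) + 1):
        for R in combinations(nbrs, r):
            realized = set(R)
            prob = 1.0
            for u in nbrs:
                prob *= p[u] if u in realized else 1.0 - p[u]
            # For additive f, the optimal T is the top-`budget` weights in R.
            best = sum(sorted((w[u] for u in R), reverse=True)[:budget])
            total += prob * best
    return total


def best_seed_set(core, adj, p, w, k):
    """Brute force over all S with |S| <= k (exponential; toy sizes only)."""
    best = (0.0, ())
    for size in range(k + 1):
        for S in combinations(sorted(core), size):
            best = max(best, (adaptive_value(S, adj, p, w, k), S))
    return best


# Hypothetical instance: adj maps each core node to its neighbors in N(X).
adj = {"x1": ["u1"], "x2": ["u1", "u2"]}
p = {"u1": 0.5, "u2": 1.0}
w = {"u1": 4.0, "u2": 1.0}
print(best_seed_set(["x1", "x2"], adj, p, w, k=2))
```

The enumeration is exponential in the number of neighbors, which is one way to see why a sampling-free reformulation is attractive for large graphs.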

It is important to note that the process through which
nodes arrive in the second stage is not an influence process.
The nodes in the second stage arrive if they are willing to
spread information in exchange for a unit of the budget.
Only when they have arrived does the influence process occur.
This process is encoded in the influence function and
occurs after the influence maximization stage without
incentivizing nodes along the propagation path. In general, the
idea of a two-stage (or in general, multi-stage) approach is
to use the nodes who arrive in the first stage to recruit
influential users who can be incentivized to spread information.
In standard influence maximization, the nodes who are not
in the core set do not receive incentives to propagate
information, and cascades tend to die off quickly [28, 3, 10, 6].

Influence functions. In this paper we focus on linear (or
additive) influence models: in these models the value of a
subset of nodes can be expressed as a weighted sum of their
individual influence. One important example of such models
is the voter model [25] used to represent the spread of opinions
in a social network: at each time step, a node adopts an
opinion with a probability equal to the fraction of its neighbors
sharing this opinion at the previous time step. Formally,
this can be written as a discrete-time Markov chain
over opinion configurations of the network. In this model
influence maximization amounts to “converting” the optimal
subset of nodes to a given opinion at the initial time so as
to maximize the number of converts after a given period of
time. Remarkably, a simple analysis shows that under this
model, the influence function $f$ is additive:

$$\forall S \subseteq V, \quad f(S) = \sum_{u \in S} w_u \tag{2}$$

where $w_u$, $u \in V$, are weights which can be easily computed
from the powers of the transition matrix of the Markov
chain. This observation led to the development of fast
algorithms for influence maximization under the voter model [8].
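
As an illustration of how such weights might be obtained, the sketch below assumes the common formulation in which each node copies the opinion of a uniformly random neighbor, so that the transition matrix $M$ is the random-walk matrix of the graph and $w_u$ is the $u$-th column sum of $M^t$. This formulation, the function name `voter_weights`, and the requirement that every node has at least one neighbor are assumptions of the sketch rather than details given in the text.

```python
def voter_weights(adj, t):
    """Column sums of M^t, where M[v][u] = 1/deg(v) for each neighbor u of v.

    Computed as the row vector 1^T M^t by t successive vector-matrix products,
    so the matrix power itself is never formed.
    """
    w = {v: 1.0 for v in adj}  # t = 0: every node counts only itself
    for _ in range(t):
        nxt = {v: 0.0 for v in adj}
        for v, nbrs in adj.items():
            share = w[v] / len(nbrs)  # assumes deg(v) >= 1
            for u in nbrs:
                nxt[u] += share
        w = nxt
    return w


# Path graph a - b - c: after one step the middle node carries most weight.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(voter_weights(path, 1))
```

Under these assumptions the weights always sum to the number of nodes, since each row of $M$ sums to one.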

NP-Hardness. In contrast to standard influence maximization,
adaptive seeding is already NP-Hard even for the
simplest influence functions such as $f(S) = |S|$ and when all
probabilities are one. We discuss this in Appendix B.

3. NON-ADAPTIVE OPTIMIZATION 

The challenging aspect of the adaptive seeding problem
expressed in Equation 1 is its adaptivity: a seed set must
be selected during the first stage such that in expectation
a high influence value can be reached when adaptively selecting
nodes on the second stage. A standard approach in
stochastic optimization for overcoming this challenge is to
use sampling to estimate the expectation of the influence
value reachable on the second stage. However, as will be
discussed in Section 5, this approach quickly becomes infeasible
even with modest size graphs.

In this section we develop an approach which avoids sampling
and allows designing adaptive seeding algorithms that
can be applied to large graphs. We show that for additive
influence functions one can optimize a relaxation of the
problem which we refer to as the non-adaptive version of
the problem. After defining the non-adaptive version, we
show in Section 3.1 that the optimal solution for the
non-adaptive version is an upper bound on the optimal solution
of the adaptive seeding problem. We then argue in Section 3.2
that any solution to the non-adaptive version of the
problem can be converted to an adaptive solution, losing
an arbitrarily small factor in the approximation ratio.
Together, this implies that one can design algorithms for the
non-adaptive problem instead, as we do in Section 4.


Non-adaptive policies. We say that a policy is non-adaptive
if it selects a set of nodes $S \subseteq X$ to be seeded in
the first stage and a vector of probabilities $q \in [0, 1]^n$, such
that each neighbor $u$ of $S$ which realizes is included in the
solution independently with probability $q_u$. The constraint
will now be that the budget is only respected in expectation,
i.e. $|S| + p^T q \le k$. Formally the optimization problem for
non-adaptive policies can be written as:

$$\max_{S \subseteq X,\ q \in [0,1]^n} \sum_{R \subseteq \mathcal{N}(X)} \Big( \prod_{u \in R} p_u q_u \prod_{u \in \mathcal{N}(X) \setminus R} (1 - p_u q_u) \Big) f(R) \quad \text{s.t. } |S| + p^T q \le k,\ q_u \le 1\{u \in \mathcal{N}(S)\} \tag{3}$$

where we denote by $1\{E\}$ the indicator variable of the event
$E$. Note that because of the condition $q_u \le 1\{u \in \mathcal{N}(S)\}$,
the summand associated with $R$ in (3) vanishes whenever
$R$ contains $u \in \mathcal{N}(X) \setminus \mathcal{N}(S)$. Hence, the summation is
restricted to $R \subseteq \mathcal{N}(S)$ as in (1).

3.1 Adaptivity Gap 

We will now justify the use of non-adaptive strategies
by showing that the optimal solution for this form of
non-adaptive strategies yields a value at least as high as that of
adaptive ones. For brevity, given a probability vector
$\pi \in [0, 1]^n$ we write:

$$F(\pi) = \sum_{R \subseteq \mathcal{N}(X)} \Big( \prod_{u \in R} \pi_u \prod_{u \in \mathcal{N}(X) \setminus R} (1 - \pi_u) \Big) f(R) \tag{4}$$

as well as $p \otimes q$ to denote the component-wise multiplication
between vectors $p$ and $q$. Finally, we write
$\mathcal{F}_A = \{S \subseteq X : |S| \le k\}$ and
$\mathcal{F}_{NA} = \{(S, q) : |S| + p^T q \le k,\ q_u \le 1\{u \in \mathcal{N}(S)\}\}$
to denote the feasible regions of the adaptive
and non-adaptive problems, respectively.


Proposition 1. For additive functions given by (2), the
value of the optimal adaptive policy is upper bounded by the
optimal non-adaptive policy:

$$\max_{S \subseteq X,\ S \in \mathcal{F}_A} \sum_{R \subseteq \mathcal{N}(S)} p_R \max_{T \subseteq R,\ |T| \le k - |S|} f(T) \;\le\; \max_{S \subseteq X,\ q \in [0,1]^n,\ (S, q) \in \mathcal{F}_{NA}} F(p \otimes q)$$

The proof of this proposition can be found in Appendix A
and relies on the following fact: the optimal adaptive policy
can be written as a feasible non-adaptive policy, hence it
provides a lower bound on the value of the optimal
non-adaptive policy.

3.2 From Non-Adaptive to Adaptive Solutions

From the above proposition we now know that optimal
non-adaptive solutions have values at least as high as adaptive
solutions. Given a non-adaptive solution $(S, q)$, a possible
scheme would be to use $S$ as an adaptive solution. But since
$(S, q)$ is a solution to the non-adaptive problem, Proposition 1
does not provide any guarantee on how well $S$ performs
as an adaptive solution.

However, we show that from a non-adaptive solution $(S, q)$,
we can obtain a lower bound on the adaptive value of $S$, that
is, the influence attainable in expectation over all
possible arrivals of neighbors of $S$. Starting from $S$, in every
realization of neighbors $R$, sample every node $u \in R \cap \mathcal{N}(S)$
with probability $q_u$, to obtain a random set of nodes
$I_R \subseteq R \cap \mathcal{N}(S)$. Since $(S, q)$ is a non-adaptive solution, it
could be that selecting $I_R$ exceeds our budget. Indeed, the only
guarantee that we have is that $|S| + E[|I_R|] \le k$. As a consequence,
an adaptive solution starting from $S$ might not be able to
select $I_R$ on the second stage.

Fortunately, the probability of exceeding the budget is
small enough and with high probability $I_R$ will be feasible.
This is exploited in [27] to design a randomized rounding
method with approximation guarantees. These rounding
methods are called contention resolution schemes. Theorem 1.3
of [27] gives us a contention resolution scheme
which will compute from $q$ and for any realization $R$ a feasible
set $I_R$, such that:

$$E_R[f(I_R)] \ge (1 - \varepsilon) F(q) \tag{5}$$

What this means is that starting from a non-adaptive solution
$(S, q)$, there is a way to construct a random feasible
subset on the second stage such that in expectation, this set
attains almost the same influence value as the non-adaptive
solution. Since the adaptive solution starting from $S$ will
select optimally from the realizations $R \subseteq \mathcal{N}(S)$, $E_R[f(I_R)]$
provides a lower bound on the adaptive value of $S$ that we
denote by $A(S)$.

More precisely, denoting by $\mathrm{OPT}_A$ the optimal value of
the adaptive problem (1), we have the following proposition
whose proof can be found in Appendix A.

Proposition 2. Let $(S, q)$ be an $\alpha$-approximate solution
to the non-adaptive problem (3), then $A(S) \ge \alpha\,\mathrm{OPT}_A$.

4. ALGORITHMS

Section 3 shows that the adaptive seeding problem reduces
to the non-adaptive problem. We will now discuss
two approaches to construct approximate non-adaptive solutions.
The first is an LP-based approach, and the second is
a combinatorial algorithm. Both approaches have the same
(1 - 1/e) approximation ratio, which is then translated to a
(1 - 1/e) approximation ratio for the adaptive seeding problem
(1) via Proposition 2. As we will show in Section 5,
both algorithms have their advantages and disadvantages in
terms of scalability.

4.1 An LP-Based Approach 

Note that due to linearity of expectation, for a linear function
$f$ of the form given by (2) we have:

$$F(p) = E_R[f(R)] = E_R\Big[\sum_{u \in \mathcal{N}(X)} w_u 1\{u \in R\}\Big] = \sum_{u \in \mathcal{N}(X)} w_u P[u \in R] = \sum_{u \in \mathcal{N}(X)} p_u w_u \tag{6}$$

Thus, the non-adaptive optimization problem (3) can be
written as:

$$\max_{S \subseteq X,\ q \in [0,1]^n} \sum_{u \in \mathcal{N}(X)} p_u q_u w_u \quad \text{s.t. } |S| + p^T q \le k,\ q_u \le 1\{u \in \mathcal{N}(S)\}$$

The choice of the set $S$ can be relaxed by introducing a
variable $\lambda_v \in [0, 1]$ for each $v \in X$. We obtain the following
LP for the adaptive seeding problem:

$$\max_{q \in [0,1]^n,\ \lambda \in [0,1]^m} \sum_{u \in \mathcal{N}(X)} p_u q_u w_u \quad \text{s.t. } \sum_{v \in X} \lambda_v + p^T q \le k,\ q_u \le \sum_{v \in \mathcal{N}(u)} \lambda_v \tag{7}$$

An optimal solution to the above problem can be found
in polynomial time using standard LP-solvers. The solution
returned by the LP is fractional, and requires a rounding
procedure to return a feasible solution to our problem,
where $S$ is integral. To round the solution we use the pipage
rounding method [2]. We defer the details to Appendix B.1.

Lemma 1. For AdaptiveSeeding-LP defined in (7), any
fractional solution $(\lambda, q) \in [0, 1]^m \times [0, 1]^n$ can be rounded to
an integral solution $\bar{\lambda} \in \{0, 1\}^m$ s.t. $(1 - 1/e) F(p \otimes q) \le A(\bar{\lambda})$
in $O(m + n)$ steps.
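
As a small worked instance of (7), consider a hypothetical example (not taken from the experiments) with core set $X = \{x_1, x_2\}$, neighborhoods $\mathcal{N}(x_1) = \{u_1\}$ and $\mathcal{N}(x_2) = \{u_1, u_2\}$, probabilities $p = (1/2, 1)$, weights $w = (4, 1)$ and budget $k = 2$. Under these assumed values the LP reads:

```latex
\begin{align*}
\max_{q,\,\lambda \in [0,1]^2} \quad & 2 q_1 + q_2 \\
\text{s.t.} \quad & \lambda_1 + \lambda_2 + \tfrac{1}{2} q_1 + q_2 \le 2, \\
& q_1 \le \lambda_1 + \lambda_2, \qquad q_2 \le \lambda_2 .
\end{align*}
% The budget constraint and q_1 <= lambda_1 + lambda_2 give
% q_2 <= 2 - (3/2) q_1, so the objective is at most 2 + q_1 / 2 <= 5/2.
```

The bound $5/2$ is attained by $\lambda = (0, 1)$, $q = (1, 1/2)$. In this particular instance $\lambda$ happens to be integral already, so no rounding would be needed; in general the LP optimum is fractional.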


4.2 A Combinatorial Algorithm

In this section, we introduce a combinatorial algorithm
with an identical approximation guarantee to the LP-based
approach. However, its running time, stated in Proposition 5,
can be better than the one given by LP solvers depending
on the relative sizes of the budget and the number of nodes
in the graph. Furthermore, as we discuss at the end of this
section, this algorithm is amenable to parallelization.

The main idea is to reduce the problem to a monotone
submodular maximization problem and apply a variant of
the celebrated greedy algorithm [23]. In contrast to standard
influence maximization, the submodularity of the
non-adaptive seeding problem is not simply a consequence of
properties of the influence function; it also strongly relies on
the combinatorial structure of the two-stage optimization.

Intuitively, we can think of our problem as trying to find
a set $S$ in the first stage, for which the nodes that can be
seeded on the second stage have the largest possible value.
To formalize this, for a budget $b \in \mathbb{R}_+$ used in the second
stage and a set of neighbors $T \subseteq \mathcal{N}(X)$, we will use $\mathcal{O}(T, b)$
to denote the solution to:

$$\mathcal{O}(T, b) = \max_{q \in [0,1]^n} \sum_{u \in \mathcal{N}(X) \cap T} p_u q_u w_u \quad \text{s.t. } p^T q \le b \tag{8}$$

The optimization problem (3) for non-adaptive policies
can now be written as:

$$\max_{S \subseteq X} \mathcal{O}(\mathcal{N}(S), k - |S|) \quad \text{s.t. } |S| \le k \tag{9}$$

We start by proving in Proposition 3 that for fixed $t$,
$\mathcal{O}(\mathcal{N}(\cdot), t)$ is submodular. This proposition relies on
Lemmas 2 and 3 about the properties of $\mathcal{O}(T, b)$.

Lemma 2. Let $T \subseteq \mathcal{N}(X)$ and $x \in \mathcal{N}(X)$, then
$\mathcal{O}(T \cup \{x\}, b) - \mathcal{O}(T, b)$ is non-decreasing in $b$.

The proof of this lemma can be found in Appendix B.2.
The main idea consists in writing:

$$\mathcal{O}(T \cup \{x\}, c) - \mathcal{O}(T \cup \{x\}, b) = \int_b^c \partial_+ \mathcal{O}_{T \cup \{x\}}(t)\, dt$$

where $\partial_+ \mathcal{O}_T$ denotes the right derivative of $\mathcal{O}(T, \cdot)$. For a
fixed $T$ and $b$, $\mathcal{O}(T, b)$ defines a fractional Knapsack problem
over the set $T$. Knowing the form of the optimal fractional
solution, we can verify that $\partial_+ \mathcal{O}_{T \cup \{x\}} \ge \partial_+ \mathcal{O}_T$ and obtain:

$$\mathcal{O}(T \cup \{x\}, c) - \mathcal{O}(T \cup \{x\}, b) \ge \mathcal{O}(T, c) - \mathcal{O}(T, b)$$

Lemma 3. For any $b \in \mathbb{R}_+$, $\mathcal{O}(T, b)$ is submodular in $T$,
$T \subseteq \mathcal{N}(X)$.

The proof of this lemma is more technical. For $T \subseteq \mathcal{N}(X)$
and $x, y \in \mathcal{N}(X) \setminus T$, we need to show that:

$$\mathcal{O}(T \cup \{x\}, b) - \mathcal{O}(T, b) \ge \mathcal{O}(T \cup \{y, x\}, b) - \mathcal{O}(T \cup \{y\}, b)$$

This can be done by partitioning the set $T$ into “high value
items” (those with weight greater than $w_x$) and “low value
items” and carefully applying Lemma 2 to the associated
subproblems. The proof is in Appendix B.2.

Finally, Lemma 3 can be used to show Proposition 3 whose
proof can be found in Appendix B.2.

Proposition 3. Let $b \in \mathbb{R}_+$, then $\mathcal{O}(\mathcal{N}(S), b)$ is monotone
and submodular in $S$, $S \subseteq X$.

We can now use Proposition 3 to reduce (9) to a monotone
submodular maximization problem. First, we note that (9)
can be rewritten:

$$\max_{S \subseteq X,\ t \in \mathbb{N}} \mathcal{O}(\mathcal{N}(S), t) \quad \text{s.t. } |S| + t \le k \tag{10}$$

Intuitively, we fix $t$ arbitrarily so that the maximization
above becomes a submodular maximization problem with
fixed budget $t$. We then optimize over the value of $t$.
Combining this observation with the greedy algorithm for monotone
submodular maximization [23], we obtain Algorithm 1,
whose performance guarantee is summarized in Proposition 4.

Algorithm 1 Combinatorial algorithm

 1: $S \leftarrow \emptyset$
 2: for $t = 1$ to $k - 1$ do
 3:     $S_t \leftarrow \emptyset$
 4:     for $i = 1$ to $k - t$ do
 5:         $x^* \leftarrow \arg\max_{x \in X \setminus S_t} \mathcal{O}(\mathcal{N}(S_t \cup \{x\}), t) - \mathcal{O}(\mathcal{N}(S_t), t)$
 6:         $S_t \leftarrow S_t \cup \{x^*\}$
 7:     end for
 8:     if $\mathcal{O}(\mathcal{N}(S_t), t) > \mathcal{O}(\mathcal{N}(S), k - |S|)$ then
 9:         $S \leftarrow S_t$
10:     end if
11: end for
12: return $S$

Proposition 4. Let $S$ be the set computed by Algorithm 1
and let us denote by $A(S)$ the value of the adaptive policy
selecting $S$ on the first stage. Then $A(S) \ge (1 - 1/e)\,\mathrm{OPT}_A$.


Parallelization. The algorithm described above considers
all possible ways to split the seeding budget between the
first and the second stage. For each possible split
$\{(t, k - t)\}_{t = 1, \dots, k - 1}$, the algorithm computes an approximation to
the optimal non-adaptive solution that uses $k - t$ nodes in
the first stage and $t$ nodes in the second stage, and returns
the solution for the split with the highest value (breaking
ties arbitrarily). This process can be trivially parallelized
across $k - 1$ machines, each performing a computation of a
single split. With slightly more effort, for any $\epsilon > 0$ one can
parallelize over $\log_{1+\epsilon} n$ machines at the cost of losing a
factor of $\epsilon$ in the approximation guarantee (see Appendix B.3
for details).

Implementation in MapReduce. While the previous
paragraph describes how to parallelize the outer for loop of
Algorithm 1, we note that its inner loop can also be parallelized
in the MapReduce framework. Indeed, it corresponds
to the greedy algorithm applied to the function $\mathcal{O}(\mathcal{N}(\cdot), t)$.
The Sample&Prune approach successfully applied in [16]
to obtain MapReduce algorithms for various submodular
maximizations can also be applied to Algorithm 1 to cast
it in the MapReduce framework. The details of the algorithm
can be found in Appendix B.4.

Algorithmic speedups. To implement Algorithm 1 efficiently,
the computation of the argmax on line 5 must be
dealt with carefully. $\mathcal{O}(\mathcal{N}(S_t \cup \{x\}), t)$ is the optimal solution
to the fractional Knapsack problem (8) with budget $t$
and can be computed in time $\min(\frac{t}{p_{\min}}, n)$, where $p_{\min}$ is the
smallest realization probability, by iterating over
the list of nodes in $\mathcal{N}(S_t \cup \{x\})$ in decreasing order of the
degrees. This decreasing order of $\mathcal{N}(S_t)$ can be maintained
throughout the greedy construction of $S_t$ by:

• ordering the list of neighbors of nodes in $X$ by decreasing
order of the degrees when initially constructing
the graph. This is responsible for a $O(n \log n)$
preprocessing time.

• when adding node $x$ to $S_t$, observe that $\mathcal{N}(S_t \cup \{x\}) =
\mathcal{N}(S_t) \cup \mathcal{N}(\{x\})$. Hence, if $\mathcal{N}(S_t)$ and $\mathcal{N}(\{x\})$ are
sorted lists, then $\mathcal{O}(\mathcal{N}(S_t \cup \{x\}), t)$ can be computed
in a single iteration of length $\min(\frac{t}{p_{\min}}, n)$ where the
two sorted lists are merged on the fly.

As a consequence, the running time of line 5 is bounded
from above by $m \min(\frac{t}{p_{\min}}, n)$. The two nested for loops are
responsible for the additional $k^2$ factor. The running time
of Algorithm 1 is summarized in Proposition 5.

Proposition 5. Let $p_{\min} = \min\{p_u, u \in \mathcal{N}(X)\}$, then
Algorithm 1 runs in time $O(n \log n + k^2 m \min(\frac{k}{p_{\min}}, n))$.
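
The structure of Algorithm 1 and of the fractional Knapsack problem (8) can be sketched in a few lines. This is a simplified illustration that recomputes each knapsack value from scratch and omits the sorted-list merging described above, so it does not achieve the stated running time; the function names, the dictionaries `adj`, `p`, `w`, and the toy instance are hypothetical.

```python
def knapsack_value(T, p, w, b):
    """O(T, b) from (8): fractional knapsack where node u costs p[u] and is
    worth p[u] * w[u], so nodes are filled greedily by decreasing w[u]."""
    remaining, value = float(b), 0.0
    for u in sorted(T, key=lambda u: -w[u]):
        if remaining <= 0:
            break
        if p[u] <= 0:
            continue  # a node that never realizes contributes nothing
        q = min(1.0, remaining / p[u])
        value += q * p[u] * w[u]
        remaining -= q * p[u]
    return value


def combinatorial_seeding(core, adj, p, w, k):
    """Greedy over S for every split (k - t, t), keeping the best split."""
    def value(S, t):
        return knapsack_value({u for x in S for u in adj[x]}, p, w, t)

    best_S, best_val = set(), 0.0
    for t in range(1, k):  # t units reserved for the second stage
        S_t = set()
        for _ in range(k - t):
            candidates = [x for x in sorted(core) if x not in S_t]
            if not candidates:
                break
            # Maximizing the value equals maximizing the marginal gain,
            # because O(N(S_t), t) is the same for every candidate.
            S_t.add(max(candidates, key=lambda x: value(S_t | {x}, t)))
        if value(S_t, t) > best_val:
            best_S, best_val = S_t, value(S_t, t)
    return best_S, best_val


# Hypothetical instance: adj maps each core node to its neighbors in N(X).
adj = {"x1": ["u1"], "x2": ["u1", "u2"]}
p = {"u1": 0.5, "u2": 1.0}
w = {"u1": 4.0, "u2": 1.0}
print(combinatorial_seeding(["x1", "x2"], adj, p, w, k=2))
```

Each iteration of the outer loop is independent of the others, which is what makes the split over $t$ straightforward to distribute.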

5. EXPERIMENTS 

In this section we validate the adaptive seeding approach
through experimentation. Specifically, we show that our
algorithms for adaptive seeding obtain significant improvement
over standard influence maximization, that these
improvements are robust to changes in environment variables,
and that our approach is efficient in terms of running-time
and scalable to large social networks.

5.1 Experimental setup

We tested our algorithms on three types of datasets. Each
of them allows us to experiment on a different aspect of the
adaptive seeding problem. The Facebook Pages dataset that
we collected ourselves has a central place in our experiments
since it is the one which is closest to actual applications of
adaptive seeding.

Synthetic networks. Using standard models of social
networks we generated large-scale graphs to model the social
network. To emulate the process of users following a
topic (the core set X) we sampled subsets of nodes at random,
and applied our algorithms on the sample and their
neighbors. The main advantage of these data sets is that
they allow us to generate graphs of arbitrary sizes and
experiment with various parameters that govern the structure
of the graph. The disadvantages are that users who follow
a topic are not necessarily random samples, and that social
networks often have structural properties that are not
captured in generative models.

Figure 2: Comparison of the average degree of the core set
users and the average degree of their friends.

Real networks. We used publicly available data sets of
real social networks available at [20]. As for synthetic
networks, we used a random sample of nodes to emulate users
who follow a topic, which is the main disadvantage of this
approach. The advantage however, is that such datasets contain
an entire network which allows testing different propagation
parameters.

Facebook Pages. We collected data from several Facebook
Pages, each associated with a commercial entity that
uses the Facebook page to communicate with its followers.
For each page, we selected a post and then collected data
about the users who expressed interest in (“liked”) the post
and their friends. The advantage of this data set is that it
is highly representative of the scenario we study here.
Campaigns run on a social network will primarily target users
who have already expressed interest in the topic being promoted.
The main disadvantage of this method is that such
data is extremely difficult to collect due to the crawling
restrictions that Facebook applies and gives us only the 2-hop
neighborhood around a post. This makes it difficult
to experiment with different propagation parameters.
Fortunately, as we soon discuss, we were able to circumvent
some of the crawling restrictions and collect large networks,
and the properties of the voter influence model are such
that these datasets suffice to accurately account for influence
propagation in the graph.

Data collection. We selected Facebook Pages in different 
verticals (topics). Each page is operated by an institution 
or an entity whose associated Facebook Page is regularly 
used for promotional posts related to this topic. On each 
of these pages, we selected a recent post (posted no later 
than January 2014) with approximately 1,000 likes. The set 
of users who liked those posts constitutes our core set. We 
then crawled the social network of these sets: for each user, 
we collected her list of friends, and the degrees (number of 
friends) of these friends. 

Data description. Among the several verticals we collected, 
we selected eight for which we report our results. We 
obtained similar results for the other ones. Table 1 
summarizes statistics about the selected verticals. We 
note that depending on the privacy settings of the core set 
users, it was not always possible to access their list of friends. 
We decided to remove these users since their ability to spread 
information could not be readily determined. This effect, 
combined with various errors encountered during the data 
collection, accounts for an approximate 15% reduction 
between the users who liked a post and the number of users in 
the datasets we used. Following our discussion in the 
introduction, we observe that on average, the degrees of core 
set users are much lower than the degrees of their friends. 
This is highlighted in Figure 2 and justifies our approach. 

Vertical        Page                  m      n 
Charity         Kiva                  978    131334 
Travel          Lonely Planet         753    113250 
Fashion         GAP                   996    115524 
Events          Coachella             826    102291 
Politics        Green Party           1044   83490 
Technology      Google Nexus          895    137995 
News            The New York Times    894    156222 
Entertainment   HBO                   828    108699 

Table 1: Dataset statistics, m: number of users in the core 
set, n: number of friends of core set users. 

5.2 Performance of Adaptive Seeding 

For a given problem instance with a budget of k we applied 
the adaptive seeding algorithm (the combinatorial version). 
Recall from Section 2 that performance is defined as the 
expected influence that the seeder can obtain by optimally 
selecting users on the second stage, where influence is de¬ 
fined as the sum of the degrees of the selected users. We 
tested our algorithm against the following benchmarks: 

• Random Node (RN): we randomly select k users from 
the core set. This is a typical benchmark in comparing 
influence maximization algorithms [13]. 

• Influence Maximization (IM): we apply the optimal in¬ 
fluence maximization algorithm on the core set. This 
is the naive application of influence maximization. For 
the voter model, when the propagation time is polynomially 
large in the network size, the optimal solution 
is to simply take the k highest degree nodes [8]. We 
study the case of bounded time horizons in Section 5.5. 

• Random Friend (RF): we implement a naive two-stage 
approach: randomly select k/2 nodes from the core 
set, and for each node select a random neighbor (hence 
spending the budget of k rewards overall). This method 
was recently shown to outperform standard influence 
maximization when the core set is random [17]. 
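
The three benchmarks above can be sketched on a toy instance as follows. This is a minimal illustration and not the code used in the experiments: the `friends` and `degree` dictionaries, the toy data, and all function names are hypothetical, and influence is measured as the sum of the degrees of the selected users, as in the text.

```python
import random


def influence(selected, degree):
    """Influence proxy from the text: sum of the degrees of the selected users."""
    return sum(degree[u] for u in selected)


def random_node(core, k, rng):
    """RN: pick k core-set users uniformly at random."""
    return rng.sample(sorted(core), k)


def influence_max(core, k, degree):
    """IM (voter model, long horizon): take the k highest-degree core-set users."""
    return sorted(core, key=lambda u: degree[u], reverse=True)[:k]


def random_friend(core, k, friends, rng):
    """RF: pick k/2 core-set users, then one random friend of each (as a set)."""
    seeds = rng.sample(sorted(core), k // 2)
    return {rng.choice(sorted(friends[u])) for u in seeds}


# Hypothetical toy instance: four low-degree core users, two high-degree friends.
core = {"a", "b", "c", "d"}
friends = {"a": {"hub1", "b"}, "b": {"a"}, "c": {"hub1", "hub2", "d"}, "d": {"c"}}
degree = {"a": 2, "b": 1, "c": 3, "d": 1, "hub1": 50, "hub2": 40}

rng = random.Random(0)
print("RN:", influence(random_node(core, 2, rng), degree))
print("IM:", influence(influence_max(core, 2, degree), degree))
print("RF:", influence(random_friend(core, 2, friends, rng), degree))
```

On such an instance IM is capped by the degrees available inside the core set, whereas RF can reach a high-degree friend by chance.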

5.3 Performance on Facebook Pages 

Figure 3 compares the performance of adaptive seeding, 
our own approach, to the aforementioned approaches for all 
the verticals we collected. In this first experiment we made 
simplifying assumptions about the parameters of the model. 
The first assumption is that all probabilities in the adaptive 
seeding model are equal to one. This implicitly assumes 
that every friend of a user who followed a certain topic is 
interested in promoting the topic given a reward. Although 
this is a strong assumption that we will revisit, we note that 
the probabilities can be controlled to some extent by the 
social networking service on which the campaign is being run 
by prominently showing the campaign material (sponsored 
links, fund-raising banners, etc.). The second assumption is 
that the measure of influence is the sum of the degrees of 
the selected set. This measure is an appealing proxy as it 
is known that in the voter model, after polynomially many 
time steps, the influence of each node is proportional to its 
degree with high probability [8]. Since the influence 
process cannot be controlled by the designer, the assumption 
is often that the influence process runs until it stabilizes (in 
linear thresholds and independent cascades for example, the 
process terminates after a linear number of steps [13]). We 
perform a set of experiments for different time horizons in 
Section 5.5. 

Figure 4: Ratio of the performance of adaptive seeding to 
IM. Bars represent the mean improvement across all 
verticals, and the “error bar” represents the range of 
improvement across verticals. 

It is striking to see how well adaptive seeding does in com¬ 
parison to other methods. Even when using a small budget 
(0.1 fraction of the core set, which in these cases is about 100 
nodes), adaptive seeding improves influence by a factor of at 
least 10, across all verticals. To confirm this, we plot the rel¬ 
ative improvements of adaptive seeding over IM in aggregate 
over the different pages. The results are shown in Figure 4. 
This dramatic improvement is largely due to the friendship 
paradox phenomenon that adaptive seeding leverages. Re¬ 
turning to Figure 3, it is also interesting to note that the RF 
heuristic significantly outperforms the standard IM bench¬ 
mark. Using the same budget, the degree gain induced by 
moving from the core set to its neighborhood is such that se¬ 
lecting at random among the core set users’ friends already 
does better than the best heuristic restricted to the core 
set. Using adaptive seeding to optimize the choice of core set 
users based on their friends’ degrees then results in an order 
of magnitude increase over RF, consistently for all the pages. 

5.4 The Effect of the Probabilistic Model 

The results presented in Section 5.3 were computed 
assuming the probabilities in the adaptive seeding model are 
one. We now describe several experiments we performed 
with the Facebook Pages data set that test the advantages 
of adaptive seeding under different probability models. 

Impact of the Bernoulli parameter. Figure 5a shows 
the impact of the probability of nodes realizing in the 
second stage. We computed the performance of adaptive 
seeding when each friend of a seeded user in the core set joins 
during the second stage independently with probability p, 
using different values of p. We call p the Bernoulli 
parameter, since the event that a given user joins on the second 
stage of adaptive seeding is governed by a Bernoulli variable 
of parameter p. We see that even with p = 0.01, adaptive 
seeding still outperforms IM. As p increases, the performance 
of adaptive seeding quickly increases and reaches 80% of the 
values of Figure 3 at p = 0.5. 

Figure 3: Performance of adaptive seeding compared to other influence maximization approaches. The horizontal axis represents the 
budget used as a fraction of the size of the core set. The vertical axis is the expected influence reachable by optimally selecting nodes on 
the second stage. 

Figure 5: (a) Performance of adaptive seeding for various 
propagation probabilities. (b) Performance of adaptive seeding 
when restricted to the subgraph of users who liked HBO (red line). 

Coarse estimation of probabilities. In practice, the 
probability a user may be interested in promoting a cam¬ 
paign her friend is promoting may vary. However, for those 
who have already expressed interest in the promoted con¬ 
tent, we can expect this probability to be close to one. We 
therefore conducted the following experiment. We chose a 
page (HBO) and trimmed the social graph we collected by 
only keeping on the second stage users who indicated this 
page (HBO) in their list of interests. This is a coarse 
estimation of the probabilities as it assumes that if a friend 
follows HBO she will be willing to promote with 
probability 1 (given a reward), and otherwise the probability of her 
promoting anything for HBO is 0. Figure 5b shows that 
even on this very restricted set of users, adaptive seeding 
still outperforms IM and reaches approximately 50% of the 
unrestricted adaptive seeding. 

Impact of the probability distribution. To test scenarios 
where users have a rich spectrum of probabilities of 
realizing on the second stage, we consider a setting where 
the Bernoulli parameter p is drawn from a distribution. We 
considered four different distributions; for each distribution, 
for fixed values of the budget and the parameter p, we tuned 
the parameters of the distribution so that its mean is exactly 
p. We then plotted the performance as a function of the 
budget and mean p. 

For the Beta distribution, we fixed $\beta = 5$ and tuned the $\alpha$ 
parameter to obtain a mean of p, thus obtaining a unimodal 
distribution. For the normal distribution, we chose a stan¬ 
dard deviation of 0.01 to obtain a distribution more concen¬ 
trated around its mean than the Beta distribution. Finally, 
for the inverse degree distribution, we took the probability 
of a node joining on the second stage to be proportional to 
the inverse of its degree (scaled so that on average, nodes 
join with probability p). The results are shown in Figure 6. 
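
The tuning step can be made concrete with a short sketch. It relies on the standard fact that a Beta($\alpha$, $\beta$) distribution has mean $\alpha/(\alpha+\beta)$, so a target mean p with $\beta = 5$ gives $\alpha = p\beta/(1-p)$. The function names and the toy degree list are hypothetical, and the inverse-degree scaling below does not clip probabilities to 1, which is an assumption that only holds when the scaling constant is smaller than the smallest degree.

```python
def beta_alpha_for_mean(p, beta=5.0):
    """Return alpha such that Beta(alpha, beta) has mean p (mean = alpha / (alpha + beta))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p * beta / (1.0 - p)


def inverse_degree_probs(degrees, p):
    """Return p_v = c / d_v, with c chosen so that the average of the p_v equals p."""
    inv = [1.0 / d for d in degrees]
    c = p * len(degrees) / sum(inv)
    return [c * x for x in inv]


# Hypothetical example: four second-stage nodes and a target mean of 0.1.
degrees = [10, 20, 40, 80]
probs = inverse_degree_probs(degrees, 0.1)
print(beta_alpha_for_mean(0.2))                 # alpha for a Beta mean of 0.2
print(sum(probs) / len(probs))                  # average joining probability
print([pv * d for pv, d in zip(probs, degrees)])  # p_v * d_v is the same for every node
```

Under the inverse-degree scaling, the product of joining probability and degree is identical for every node, so degree no longer distinguishes second-stage users.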

We observe that the results are comparable to the ones we 
obtained in the uniform case in Figure 5a, except in the case 
of the inverse degree distribution, for which the performance 
is roughly halved. Recall that the value of a user $v$ on 
the second stage of adaptive seeding is given by $p_v d_v$, where 
$d_v$ is its degree and $p_v$ is its probability of realizing on the 
second stage. Choosing $p_v$ to be proportional to $1/d_v$ has 
the effect of normalizing the nodes on the second stage and 
is a strong perturbation of the original degree distribution 
of the nodes available on the second stage. 
Figure 6: Performance of adaptive seeding as a function of the budget and the mean of the distribution from which the 
Bernoulli parameters are drawn. The details of the parameters for each distribution can be found in Section 5.4. 

Figure 7: Performance of adaptive seeding compared to IM 
for the voter influence model with t steps. 


5.5 Impact of the Influence Model 

The Facebook Pages data set we collected is limited in 
that we only have access to the 2-hop neighborhood around 
the seed users and we use the degree of the second stage 
users as a proxy for their influence. As proved in [8], in 
the voter model, the influence of nodes converges to their 
degree with high probability when the number of time steps 
becomes polynomially large in the network size. 

In order to analyze the expected number of nodes influ¬ 
enced according to the voter model that terminates after 
some fixed number of time steps, we use publicly available 
data sets from [20] where the entire network is at our dis¬ 
posal. As discussed above, we sample nodes uniformly at 
random to model the core set. We then run the voter model 
for t time steps to compute the influence of the second stage 
users. Figure 7 shows the performance of adaptive seeding 
as a function of t compared to the performance of the IM 
benchmark. In this experiment, the budget was set to half 
the size of the core set. 

We see that the performance of adaptive seeding quickly 
converges (5 time steps for Slashdot, 15 time steps for 
Epinions). In practice, the voter model converges much faster 
than the theoretical guarantee of [8], which justifies using 
the degree of the second stage users as a measure of influence 
as we did for the Facebook Pages data sets. Furthermore, 
we see that similarly to the Facebook data sets, adaptive 
seeding significantly outperforms IM. 
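
The convergence of influence to degree can be illustrated with a small sketch. It assumes the usual formulation of the voter model, in which each node copies the opinion of a uniformly random neighbor at every step, so that the expected influence of node v after t steps is the v-th entry of $(M^\top)^t \mathbf{1}$ with $M$ the random-walk matrix of the graph. The function name and the toy graph are hypothetical.

```python
def voter_influence(adj, t):
    """Expected number of nodes holding node v's opinion after t steps, for every v.

    adj[u] lists the neighbors of node u in an undirected graph. Each step applies
    the transpose of the random-walk matrix: node u sends w[u] / deg(u) to each neighbor.
    """
    n = len(adj)
    w = [1.0] * n
    for _ in range(t):
        new_w = [0.0] * n
        for u in range(n):
            share = w[u] / len(adj[u])
            for v in adj[u]:
                new_w[v] += share
        w = new_w
    return w


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
num_edges = sum(len(nbrs) for nbrs in adj) // 2

for t in (0, 1, 5, 50):
    print(t, [round(x, 3) for x in voter_influence(adj, t)])

# On a connected, non-bipartite graph the limit is n * deg(v) / (2 * num_edges).
limit = [len(adj) * len(nbrs) / (2 * num_edges) for nbrs in adj]
print(limit)
```

On this toy graph the influence vector approaches the degree-proportional limit within a few dozen steps, in line with the fast convergence reported above.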

5.6 Performance on Synthetic Networks 

In order to analyze the impact of topological variations 
we generated synthetic graphs using standard network 
models. All the generated graphs have 100,000 vertices. For 
each model, we tuned the generative parameters to obtain, when 
possible, a degree distribution (or graph density otherwise) 
similar to what we observed in the Facebook Pages data sets. 

• Barabasi-Albert: this well-known model is often used 
to model social graphs because its degree distribution 
is a power law. We took 10 initial vertices and added 10 
vertices at each step, using the preferential attachment 
model, until we reached 100,000 vertices. 

• Small-World: also known as the Watts-Strogatz model. 
This model was one of the first models proposed for 
social networks. Its diameter and clustering coeffi¬ 
cient are more representative of a social network than 
what one would get with the Erdos-Renyi model. We 
started from a regular lattice of degree 200 and rewired 
each edge with probability 0.3. 

• Kronecker: Kronecker graphs were more recently in¬ 
troduced in [18] as a scalable and easy-to-fit model for 
social networks. We started from a star graph with 
4 vertices and computed Kronecker products until we 
reached 100,000 nodes. 

• Configuration model: The configuration model allows 
us to construct a graph with a given degree distribu¬ 
tion. We chose a page (GAP) and generated a graph 
with the same degree distribution using the configura¬ 
tion model. 

The performance of adaptive seeding compared to our bench¬ 
marks can be found in Figure 8. We note that the improve¬ 
ment obtained by adaptive seeding is comparable to the one 
we had on real data except for the Small-World model. This 
is explained by the nature of the model: starting from a reg¬ 
ular lattice, some edges are re-wired at random. This model 
has similar properties to a random graph where the 
friendship paradox does not hold [7]. Since adaptive seeding is 
designed to leverage the friendship paradox, such graphs are 
not amenable to this approach. 
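
One way to see why a rewired lattice offers little to gain is to compare the average degree of a random friend with the average degree of a random node. The sketch below uses one common formalization of the friendship paradox (the ratio of these two averages), which is an assumption on our part; the function names and toy graphs are hypothetical.

```python
def ring_lattice(n, k):
    """Regular ring lattice: each node is linked to its k nearest neighbors (k even)."""
    half = k // 2
    return [[(i + j) % n for j in range(1, half + 1)] +
            [(i - j) % n for j in range(1, half + 1)] for i in range(n)]


def friend_to_node_degree_ratio(adj):
    """Average degree of a random edge endpoint divided by average degree of a random node."""
    degs = [len(nbrs) for nbrs in adj]
    mean_deg = sum(degs) / len(degs)
    mean_friend_deg = sum(d * d for d in degs) / sum(degs)
    return mean_friend_deg / mean_deg


# A regular lattice has no degree variance, so the ratio is exactly 1.
print(friend_to_node_degree_ratio(ring_lattice(10, 4)))

# A star graph (one hub, four leaves) has high degree variance, so the ratio exceeds 1.
star = [[1, 2, 3, 4], [0], [0], [0], [0]]
print(friend_to_node_degree_ratio(star))
```

The ratio equals 1 plus the squared coefficient of variation of the degrees, so it stays close to 1 whenever degrees are nearly uniform.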

5.7 Scalability 

To test the scalability of adaptive seeding we were guided 
by two central questions. First, we wanted to measure the 
benefit our non-sampling approach has over the standard 
SAA method. Second, we wanted to understand when one 
should prefer the LP-based approach from Section 4.1 over 
the combinatorial one from Section 4.2. The computations in 
this section were run on an Intel Core i5 CPU (4 × 2.40 GHz). 
For each computation, we plot the time and number of CPU 
cycles it took. 

Figure 8: Performance of adaptive seeding on synthetic 
networks. 

Comparison with SAA. The objective function of the 
non-adaptive problem (3) is an expectation over exponen¬ 
tially many sets, all possible realizations of the neighbors in 
the second stage. Following the sampling-based approach 
introduced in [26], this expectation can be computed by 
averaging the values obtained in $O(n^2)$ independent sample 
realizations of the second stage users (n is the number of 
neighbors of core set users). One important aspect of the al¬ 
gorithms introduced in this paper is that in the additive case, 
this expectation can be computed exactly without sampling, 
thus significantly improving the theoretical complexity. 

In Figure 9, we compare the running time of our 
combinatorial algorithm to the same algorithm where the 
expectation is computed via sampling. We note that this 
sampling-based algorithm is still simpler than the algorithm 
introduced in [26] for general influence models. However, we 
observe a significant gap between its running time and that 
of the combinatorial algorithm. Since each sample takes 
linear time to compute, this gap is in fact $O(n^3)$, quickly 
leading to impractical running times as the size of the 
graph increases. This highlights the importance of the 
sampling-free approach underlying the algorithms we introduced. 
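
The difference between the two ways of evaluating the expectation can be sketched as follows. The sketch assumes the additive objective $\sum_u p_u q_u w_u$ used in the appendix, where $p_u$ is the probability that $u$ realizes, $w_u$ its weight, and $q_u$ the fraction of $u$ that is selected; the function names and the toy numbers are hypothetical.

```python
import random


def exact_value(p, w, q):
    """Closed form in the additive case: linearity of expectation gives sum of p_u * q_u * w_u."""
    return sum(pu * qu * wu for pu, qu, wu in zip(p, q, w))


def sampled_value(p, w, q, num_samples, rng):
    """Monte Carlo estimate: draw realizations of the second stage and average the values."""
    total = 0.0
    for _ in range(num_samples):
        total += sum(qu * wu for pu, qu, wu in zip(p, q, w) if rng.random() < pu)
    return total / num_samples


# Hypothetical instance with three second-stage nodes.
p = [0.5, 0.2, 1.0]
w = [10.0, 20.0, 5.0]
q = [1.0, 1.0, 0.5]

print(exact_value(p, w, q))
print(sampled_value(p, w, q, 20000, random.Random(0)))
```

The closed form costs one pass over the nodes, while the estimate costs one pass per sample and still carries sampling error.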

Combinatorial vs. LP algorithm. We now compare the 
running time of the LP-based approach and the combinato¬ 
rial approach for different instance sizes. 

Figure 10 shows the running time and number of CPU 
cycles used by the LP algorithm and the combinatorial 
algorithm as a function of the network size n. The varying 
size of the network was obtained by randomly sampling a 
varying fraction of core set users and then trimming the 
social graph by only keeping friends of this random sample on 
the second stage. The LP solver used was CLP [1]. 

Figure 9: Running time and number of CPU cycles used by 
the sampling-based algorithm and the combinatorial 
adaptive seeding algorithm for different sizes of the core set. 

Figure 10: Running time and number of CPU cycles of the 
combinatorial algorithm and the LP algorithm as a function 
of the number of nodes n. First row with budget k = 100, 
second row with budget k = 500. 

We observe that for a small value of the budget k (first 
row of Figure 10), the combinatorial algorithm outperforms 
the LP algorithm. When k becomes large (second row of 
Figure 10), the LP algorithm becomes faster. This can be 
explained by the $k^2$ factor in the running time of the 
combinatorial algorithm (see Proposition 5). Even though the 
asymptotic guarantee of the combinatorial algorithm should 
theoretically outperform the LP-based approach for large n, 
we were not able to observe it for our instance sizes. In prac¬ 
tice, one can choose which of the two algorithms to apply 
depending on the relative sizes of k and n. 


6. RELATED WORK 

Influence maximization was introduced by Domingos and 
Richardson [7, 24], formulated by Kempe, Kleinberg and 
Tardos [13, 14], and has been extensively studied since [22, 
5, 19, 21]. The main result in [13, 14] is a 
characterization of influence processes as submodular functions, 
which implies good approximation guarantees for the 
influence maximization problem. In [8], the authors look at the 
special case of the voter model and design efficient 
algorithms in this setting. 

Our two-stage model for influence maximization is related 
to the field of stochastic optimization where problems are 
commonly solved using the sample average approximation 
method [15]. Golovin and Krause [11] study a stochastic 
sequential submodular maximization problem where at each 
step an element is chosen, its realization is revealed and the 
next decision is made. We note that contrary to adaptive 
seeding, the decision made at a given stage does not affect 
the following stages as the entire set of nodes is available as 
potential seeds at every stage. 
APPENDIX 

A. ADAPTIVITY PROOFS 

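Proof of Proposition 1. We will first show that the 
optimal adaptive policy can be interpreted as a non-adaptive 
policy. Let $S$ be the optimal adaptive solution and define 
$\delta_R : \mathcal{N}(X) \to \{0,1\}$: 

\[
\delta_R(u) =
\begin{cases}
1 & \text{if } u \in \operatorname{argmax}\{f(T);\ T \subseteq R,\ |T| \le k - |S|\} \\
0 & \text{otherwise}
\end{cases}
\]

One can write 

\[
\sum_{R \subseteq \mathcal{N}(S)} p_R \max_{\substack{T \subseteq R \\ |T| \le k - |S|}} f(T)
= \sum_{R \subseteq \mathcal{N}(S)} p_R \sum_{u \in \mathcal{N}(X)} \delta_R(u) w_u
= \sum_{u \in \mathcal{N}(X)} w_u \sum_{R \subseteq \mathcal{N}(S)} p_R \delta_R(u).
\]

Let us now define for $u \in \mathcal{N}(X)$: 

\[
q_u =
\begin{cases}
\sum_{R \subseteq \mathcal{N}(S)} \frac{p_R}{p_u} \delta_R(u) & \text{if } p_u \neq 0 \\
0 & \text{otherwise}
\end{cases}
\]

This allows us to write: 

\[
\sum_{R \subseteq \mathcal{N}(S)} p_R \max_{\substack{T \subseteq R \\ |T| \le k - |S|}} f(T)
= \sum_{u \in \mathcal{N}(X)} p_u q_u w_u = F(\mathbf{p} \circ \mathbf{q})
\]

where the last equality is obtained from (4) by successively 
using the linearity of the expectation and the linearity of $f$. 

Furthermore, observe that $q_u \in [0,1]$, $q_u = 0$ if $u \notin \mathcal{N}(S)$ 
and: 

\[
|S| + \sum_{u \in \mathcal{N}(X)} p_u q_u
= |S| + \sum_{R \subseteq \mathcal{N}(S)} p_R \sum_{u \in \mathcal{N}(X)} \delta_R(u)
\le |S| + \sum_{R \subseteq \mathcal{N}(S)} p_R (k - |S|) \le k
\]

Hence, $(S, \mathbf{q}) \in \mathcal{F}_{NA}$. In other words, we have written 
the optimal adaptive solution as a relaxed non-adaptive 
solution. This concludes the proof of the proposition. □ 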


Proof of Proposition 2. Using the definition of $A(S)$, 
one can write: 

\[
A(S) = \sum_{R \subseteq \mathcal{N}(S)} p_R \max_{\substack{T \subseteq R \\ |T| \le k - |S|}} f(T)
\ge \sum_{R \subseteq \mathcal{N}(S)} p_R \, \mathbb{E}[f(I)]
\]

where the inequality comes from the fact that $I$ is a feasible 
random set: $|I| \le k - |S|$, hence the expected value of $f(I)$ 
is bounded by the maximum of $f$ over feasible sets. 

Equation (5) then implies: 

\[
A(S) \ge (1 - \epsilon) \sum_{R \subseteq \mathcal{N}(S)} p_R F(\mathbf{q}) = (1 - \epsilon) F(\mathbf{p} \circ \mathbf{q}). \qquad (11)
\]

Equation (11) holds for any $\epsilon > 0$. In particular, for $\epsilon$ 
smaller than $\inf_{S \neq T} |A(S) - A(T)|$, we obtain that $A(S) \ge 
F(\mathbf{p} \circ \mathbf{q})$. Note that such an $\epsilon$ is at most polynomially small 
in the size of the instance. $(S, \mathbf{q})$ is an $\alpha$-approximate 
non-adaptive solution, hence $F(\mathbf{p} \circ \mathbf{q}) \ge \alpha \, \mathrm{OPT}_{NA}$. We can then 
conclude by applying Proposition 1. □ 

B. ALGORITHMS PROOFS 

We first discuss the NP-hardness of the problem. 

NP-Hardness. In contrast to standard influence 
maximization, adaptive seeding is already NP-Hard even for the 
simplest cases. In the case when $f(S) = |S|$ and all 
probabilities equal one, the decision problem is whether given a 
budget $k$ and target value $\ell$ there exists a subset of $X$ of size 
$k - t$ which yields a solution with expected value of $\ell$ using 
$t$ nodes in $\mathcal{N}(X)$. This is equivalent to deciding whether 
there are $k - t$ nodes in $X$ that have $t$ neighbors in $\mathcal{N}(X)$. 
To see this is NP-hard, consider reducing from Set-Cover 
where there is one node $i$ for each input set $T_i$, $1 \le i \le n$, 
with $\mathcal{N}(i) = T_i$ and integers $k, \ell$, and the output is “yes” if 
there is a family of $k$ sets in the input which cover at least 
$\ell$ elements, and “no” otherwise. 

B.1 LP-based approach 

In the LP-based approach we rounded the solution using 
the pipage rounding method. We discuss this with greater 
detail here. 

Pipage Rounding. The pipage rounding method [2] is 
a deterministic rounding method that can be applied to 
a variety of problems. In particular, it can be applied to 
LP-relaxations of the Max-K-Cover problem where we are 
given a family of sets that cover elements of a universe and 
the goal is to find $k$ subsets whose union has the maximal 
cardinality. The LP-relaxation is a fractional solution over 
subsets, and the pipage rounding procedure then rounds the 
allocation in linear time, and the integral solution is 
guaranteed to be within a factor of $(1 - 1/e)$ of the fractional 
solution. We make the following key observation: for any 
given $\mathbf{q}$, one can remove all elements in $\mathcal{N}(X)$ for which 
$q_u = 0$, without changing the value of any solution $(\boldsymbol{\lambda}, \mathbf{q})$. 
Our rounding procedure can therefore be described as 
follows: given a solution $(\boldsymbol{\lambda}, \mathbf{q})$ we remove all nodes $u \in \mathcal{N}(X)$ 
for which $q_u = 0$, which leaves us with a fractional 
solution to a (weighted) version of the Max-K-Cover problem 
where nodes in $X$ are the sets and the universe is the set 
of weighted nodes in $\mathcal{N}(X)$ that were not removed. We can 
therefore apply pipage rounding and lose only a factor of 
$(1 - 1/e)$ in quality of the solution. 

B.2 Combinatorial Algorithm 

We include the missing proofs from the combinatorial al¬ 
gorithm section. The scalability and implementation in MapRe¬ 
duce are discussed in this section as well. 


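Proof of Lemma 2. W.l.o.g. we can rename and order 
the pairs in $T$ so that $w_1 \ge w_2 \ge \ldots \ge w_m$. Then, $\mathcal{O}(T, b)$ 
has the following simple piecewise linear expression: 

\[
\mathcal{O}(T, b) =
\begin{cases}
b w_1 & \text{if } 0 \le b < p_1 \\
\sum_{k=1}^{i-1} p_k (w_k - w_i) + b w_i & \text{if } 0 \le b - \sum_{k=1}^{i-1} p_k < p_i \\
\sum_{k=1}^{m} p_k w_k & \text{if } b \ge \sum_{k=1}^{m} p_k
\end{cases}
\]

Let us define for $t \in \mathbb{R}^+$, $n(t) = \inf\{ i \text{ s.t. } \sum_{k=1}^{i} p_k > t \}$ 
with $n(t) = +\infty$ when the set is empty. In particular, note 
that $t \mapsto n(t)$ is non-decreasing. Denoting $\partial_+ \mathcal{O}_T$ the right 
derivative of $\mathcal{O}(T, \cdot)$, one can write $\partial_+ \mathcal{O}_T(t) = w_{n(t)}$, with 
the convention that $w_\infty = 0$. 

Writing $i = \sup\{ j \text{ s.t. } w_j \ge w_x \}$, it is easy to see that 
$\partial_+ \mathcal{O}_{T \cup \{x\}} \ge \partial_+ \mathcal{O}_T$. Indeed: 

1. if $n(t) \le i$ then $\partial_+ \mathcal{O}_{T \cup \{x\}}(t) = \partial_+ \mathcal{O}_T(t) = w_{n(t)}$. 

2. if $n(t) \ge i + 1$ and $n(t - c) \le i$ then $\partial_+ \mathcal{O}_{T \cup \{x\}}(t) = 
w_x \ge w_{n(t)} = \partial_+ \mathcal{O}_T(t)$. 

3. if $n(t - c) \ge i + 1$, then $\partial_+ \mathcal{O}_{T \cup \{x\}}(t) = w_{n(t - c)} \ge w_{n(t)} = 
\partial_+ \mathcal{O}_T(t)$. 

Let us now consider $b$ and $c$ such that $b \le c$. Then, using 
the integral representation of $\mathcal{O}(T \cup \{x\}, \cdot)$ and $\mathcal{O}(T, \cdot)$, we 
get: 

\[
\mathcal{O}(T \cup \{x\}, c) - \mathcal{O}(T \cup \{x\}, b) = \int_b^c \partial_+ \mathcal{O}_{T \cup \{x\}}(t)\,dt
\ge \int_b^c \partial_+ \mathcal{O}_T(t)\,dt = \mathcal{O}(T, c) - \mathcal{O}(T, b)
\]

Re-ordering the terms, $\mathcal{O}(T \cup \{x\}, c) - \mathcal{O}(T, c) \ge \mathcal{O}(T \cup 
\{x\}, b) - \mathcal{O}(T, b)$, which concludes the proof of the lemma. □ 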
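
The piecewise linear expression used in this proof is the value of a fractional knapsack that spends the budget on the heaviest pairs first. The following sketch computes it and checks the monotonicity of marginal contributions stated by the lemma on a toy instance. It assumes each element of $T$ is a pair $(p_u, w_u)$ and that the budget constrains the total probability mass spent; the function name and the numbers are hypothetical.

```python
def second_stage_value(pairs, budget):
    """Fractional knapsack over (p_u, w_u) pairs: spend at most `budget` of probability
    mass, at most p_u on each element, taking elements in order of decreasing weight."""
    value, remaining = 0.0, float(budget)
    for p, w in sorted(pairs, key=lambda pw: pw[1], reverse=True):
        if remaining <= 0:
            break
        spend = min(p, remaining)
        value += spend * w
        remaining -= spend
    return value


# Hypothetical instance: T has two pairs, x is a new pair of intermediate weight.
T = [(0.5, 10.0), (0.5, 4.0)]
x = (0.4, 6.0)

# Marginal contribution of x for increasing budgets; Lemma 2 says it cannot decrease.
marginals = [second_stage_value(T + [x], b) - second_stage_value(T, b)
             for b in (0.3, 0.8, 2.0)]
print(marginals)
```

With a small budget, x contributes nothing because the budget is exhausted on the heavier pair; its contribution grows once the budget reaches it.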

Proof of Lemma 3. Let $T \subseteq \mathcal{N}(X)$ and $x, y \in \mathcal{N}(X) \setminus 
T$. Using the second-order characterization of submodular 
functions, it suffices to show that: 

\[
\mathcal{O}(T \cup \{x\}, b) - \mathcal{O}(T, b) \ge \mathcal{O}(T \cup \{y, x\}, b) - \mathcal{O}(T \cup \{y\}, b)
\]

We distinguish two cases based on the relative position of 
$w_x$ and $w_y$. The following notations will be useful: $S_T^x = 
\{u \in T \text{ s.t. } w_x \ge w_u\}$ and $P_T^x = T \setminus S_T^x$. 

0(T U {y, x}, b) = 0(P y U {y}, 61 ) + 0(S y T U {x}, b 2 ) 
0(T U {y}, b) = 0(P y U{y}, 6 i) + 0(S* , b 2 ) 

where bi is the fraction of the budget b spent on Plj. U {y} 
and b 2 = b — bi. 

Similarly: 

0(T U {x}, b) = 0(1%, ci) + 0(S y T U {x}, c 2 ) 

0(T,b) = 0(P y ,c 1 ) + 0(S y ,c 2 ) 

where ci is the fraction of the budget b spent on P y and 
c 2 = b — Cl. 

Note that bi > ci: an optimal solution will first spent as 
much budget as possible on P^ U{y} before adding elements 
in St U {x}. 

In this case: 

0(T U {x}, b) - 0(T , b ) = 0(S y T U {x}, c 2 ) + 0(S y T , c 2 ) 

> 0(S y U{x},b 2 )+0(S y .,b 2 ) 

= 0(TU{y,x},b)-0(Tu{y},b) 

where the inequality comes from Lemma 2 and c 2 > b 2 . 

Case 2: If w x > w y , we now decompose the solution on 
Pf and Sj-- 

0(T U {x}, b) = 0(1% U {x}, bi) + 0(S£, b 2 ) 

0(T, b) = 0 (Pt, ci) + 0(St, c 2 ) 

0(T U {y, x}, b) = 0(1% U {x}, bi) + 0(S X T U {y}, b 2 ) 
0(T U {y}, b) = 0(1%, ci) + 0(S X T U {y}, c 2 ) 

with bi + b 2 = b, ci + c 2 = b and b 2 < c 2 . 

In this case again: 

0(T U {x}, b) - 0(T, b) = 0(S%, b 2 ) - 0(S%-, c 2 ) 

> 0(5? U {y}, ba) - 0(S%- U {y},c 2 ) 

= 0(T U {y, x}, b) — 0(T U {y}, b) 

where the inequality uses Lemma 2 and c 2 > b 2 . 

In both cases, we were able to obtain the second-order 
characterization of submodularity. This concludes the proof 
of the lemma. □ 

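Proof of Proposition 3. Let us consider $S$ and $T$ such 
that $S \subseteq T \subseteq X$ and $x \in X \setminus T$. In particular, note that 
$\mathcal{N}(S) \subseteq \mathcal{N}(T)$. 

Let us write $\mathcal{N}(S \cup \{x\}) = \mathcal{N}(S) \cup R$ with $\mathcal{N}(S) \cap R = \emptyset$ 
and similarly, $\mathcal{N}(T \cup \{x\}) = \mathcal{N}(T) \cup R'$ with $\mathcal{N}(T) \cap R' = \emptyset$. 
It is clear that $R' \subseteq R$. Writing $R' = \{u_1, \ldots, u_k\}$: 

\[
\begin{aligned}
\mathcal{O}(\mathcal{N}(T \cup \{x\}), b) - \mathcal{O}(\mathcal{N}(T), b)
&= \sum_{i=1}^{k} \mathcal{O}(\mathcal{N}(T) \cup \{u_1, \ldots, u_i\}, b) - \mathcal{O}(\mathcal{N}(T) \cup \{u_1, \ldots, u_{i-1}\}, b) \\
&\le \sum_{i=1}^{k} \mathcal{O}(\mathcal{N}(S) \cup \{u_1, \ldots, u_i\}, b) - \mathcal{O}(\mathcal{N}(S) \cup \{u_1, \ldots, u_{i-1}\}, b) \\
&= \mathcal{O}(\mathcal{N}(S) \cup R', b) - \mathcal{O}(\mathcal{N}(S), b)
\end{aligned}
\]

where the inequality comes from the submodularity of $\mathcal{O}(\cdot, b)$ 
proved in Lemma 3. This same function is also obviously 
set-increasing, hence: 

\[
\begin{aligned}
\mathcal{O}(\mathcal{N}(S) \cup R', b) - \mathcal{O}(\mathcal{N}(S), b)
&\le \mathcal{O}(\mathcal{N}(S) \cup R, b) - \mathcal{O}(\mathcal{N}(S), b) \\
&= \mathcal{O}(\mathcal{N}(S \cup \{x\}), b) - \mathcal{O}(\mathcal{N}(S), b)
\end{aligned}
\]

This concludes the proof of the proposition. □ 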

Proof of Proposition 4. We simply note that the 
content of the outer for loop on line 2 of Algorithm 1 is the 
greedy submodular maximization algorithm of [23]. Since 
$\mathcal{O}(\mathcal{N}(\cdot), t)$ is submodular (Proposition 3), this solves the 
inner max in (10) with an approximation ratio of $(1 - 1/e)$. 
The outer for loop then computes the outer max of (10). 

As a consequence, Algorithm 1 computes a $(1 - 1/e)$-approximate 
non-adaptive solution. We conclude by applying 
Proposition 2. □ 
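
The structure described in this proof (an outer loop over the budget split, and a greedy inner loop for each split) can be sketched as follows. Algorithm 1 itself is defined in Section 4.2 and is not reproduced here, so this is only an illustration under stated assumptions: the second-stage value is taken to be the fractional knapsack value from the proof of Lemma 2, a split assigns $k - t$ users to the first stage and budget $t$ to the second, and all names and the toy instance are hypothetical.

```python
def second_stage_value(pairs, budget):
    """Fractional knapsack over (p_u, w_u) pairs, heaviest weights first."""
    value, remaining = 0.0, float(budget)
    for p, w in sorted(pairs, key=lambda pw: pw[1], reverse=True):
        if remaining <= 0:
            break
        spend = min(p, remaining)
        value += spend * w
        remaining -= spend
    return value


def neighborhood_value(S, friends, p, w, budget):
    """Second-stage value of the union of the friends of the users in S."""
    nbrs = set().union(*(friends[x] for x in S)) if S else set()
    return second_stage_value([(p[u], w[u]) for u in nbrs], budget)


def greedy_adaptive_seeding(core, friends, p, w, k):
    """Try every split (k - t, t); for each, greedily pick k - t core users."""
    best_S, best_val = set(), 0.0
    for t in range(1, k):                       # t: second-stage budget
        S = set()
        for _ in range(k - t):                  # k - t: first-stage budget
            candidates = sorted(core - S)
            if not candidates:
                break
            # Greedy step: add the user whose friends raise the value the most.
            S.add(max(candidates,
                      key=lambda c: neighborhood_value(S | {c}, friends, p, w, t)))
        val = neighborhood_value(S, friends, p, w, t)
        if val > best_val:
            best_S, best_val = S, val
    return best_S, best_val


# Hypothetical toy instance: users a and b each have one high-weight friend.
core = {"a", "b", "c"}
friends = {"a": {"h1"}, "b": {"h2"}, "c": {"u1", "u2"}}
w = {"h1": 50.0, "h2": 40.0, "u1": 5.0, "u2": 5.0}
p = {u: 1.0 for u in w}

print(greedy_adaptive_seeding(core, friends, p, w, k=4))
```

Each value of t is independent of the others, which is what makes the outer loop straightforward to distribute across machines.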

B.3 Parallelization 

As discussed in the body of the paper, the algorithm can 
be parallelized across $k$ different machines, each one 
computing an approximation for a fixed budget $k - t$ in the first 
stage and $t$ in the second. A slightly more sophisticated 
approach is to consider only $\log n$ splits: $(1, k - 1), (2, k - 
2), \ldots, (2^{\lfloor \log n \rfloor}, 1)$ and then select the best solution from this 
set. It is not hard to see that in comparison to the previous 
approach, this would reduce the approximation guarantee 
by a factor of at most 2: if the optimal solution is obtained 
by spending $t$ on the first stage and $k - t$ in the second 
stage, then since $t < 2 \cdot 2^{\lfloor \log t \rfloor}$ the solution computed for 
$(2^{\lfloor \log t \rfloor}, k - 2^{\lfloor \log t \rfloor})$ will have at least half that value. More 
generally, for any $\epsilon > 0$ one can parallelize over $\log_{1+\epsilon} n$ 
machines at the cost of losing a factor of $(1 + \epsilon)$ in the 
approximation guarantee. 

B.4 Implementation in MapReduce 

As noted in Section 4.2, lines 4 to 7 of Algorithm 1 
correspond to the greedy heuristic of [23] applied to the 
submodular function $f_t(S) = \mathcal{O}(\mathcal{N}(S), t)$. A variant of this 
heuristic, namely the $\epsilon$-greedy heuristic, combined with the 
Sample&Prune method of [16] allows us to write a MapReduce 
version of Algorithm 1. The resulting algorithm is described 
in Algorithm 2. 

We denote by $\nabla f_t(S, x)$ the marginal increment of $x$ to 
the set $S$ for the function $f_t$, $\nabla f_t(S, x) = f_t(S \cup \{x\}) - f_t(S)$. 
$\Delta$ is an upper bound on the marginal contribution of any 
element. In our case, $\Delta = \max_{u \in \mathcal{N}(X)} w_u$ provides such an 
upper bound. The sampling in line 7 selects a small enough 
number of elements that the while loop from lines 8 to 14 
can be executed on a single machine. Furthermore, lines 
7 and 16 can be implemented in one round of MapReduce 
each. 

Algorithm 2 Combinatorial algorithm, MapReduce 

 1: $S \leftarrow \emptyset$ 
 2: for $t = 1$ to $k - 1$ do 
 3:   $S_t \leftarrow \emptyset$ 
 4:   for $i = 1$ to $\log_{1+\epsilon} \Delta$ do 
 5:     $U \leftarrow X$, $S' \leftarrow \emptyset$ 
 6:     while $|U| > 0$ do 
 7:       $R \leftarrow$ sample from $U$ w.p. $\min(1, \ldots)$ 
 8:       while $|R| > 0$ or $|S_t \cup S'| < k$ do 
 9:         $x \leftarrow$ some element from $R$ 
10:         if $\nabla f_t(S_t \cup S', x) \ge \frac{\Delta}{(1+\epsilon)^i}$ then 
11:           $S' \leftarrow S' \cup \{x\}$ 
12:         end if 
13:         $R \leftarrow R \setminus \{x\}$ 
14:       end while 
15:       $S_t \leftarrow S_t \cup S'$ 
16:       $U \leftarrow \{x \in U \mid \nabla f_t(S_t, x) \ge \frac{\Delta}{(1+\epsilon)^i}\}$ 
17:     end while 
18:   end for 
19:   if $\mathcal{O}(\mathcal{N}(S_t), t) > \mathcal{O}(\mathcal{N}(S), k - |S|)$ then 
20:     $S \leftarrow S_t$ 
21:   end if 
22: end for 
23: return $S$ 

The approximation ratio of Algorithm 2 is $1 - \frac{1}{e} - \epsilon$. The 
proof of this result as well as the optimal choice of $\ell$ follow 
from Theorem 10 in [6]. 








